[ZEPPELIN-3986]. Cannot access any JAR in yarn cluster mode#3308

Closed
zjffdu wants to merge 1 commit into apache:master from zjffdu:ZEPPELIN-3986

Contributor

@zjffdu zjffdu commented Feb 16, 2019

What is this PR for?

User-specified jars are missing in yarn-cluster mode because we didn't detect the user jars correctly. This PR fixes the jar detection logic in BaseSparkScalaInterpreter.

What type of PR is it?

[Bug Fix]

Todos

  • - Task

What is the Jira issue?

How should this be tested?

  • A system integration test is added to SparkIntegrationTest; it covers both spark.jars and spark.jars.packages.

Screenshots (if appropriate)

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No


InterpreterResult result = interpreter.interpret("import com.databricks.spark.avro._", getInterpreterContext());
assertEquals(InterpreterResult.Code.SUCCESS, result.code());
}
Contributor Author

We cannot test the user jar case in a unit test (MutableURLClassLoader only exists in a real Spark app launched via spark-submit), so I removed it here.
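
As an illustration of the point above, here is a minimal sketch of how one might check whether a class loader of a given type is present in the current loader chain. The class name `LoaderChainSketch` and the method `findBySimpleName` are hypothetical names for this example; this is not the code in this PR.

```java
final class LoaderChainSketch {
    /**
     * Walks the parent links starting at `start` and returns the first
     * class loader whose class has the given simple name, or null if
     * no loader in the chain matches.
     */
    static ClassLoader findBySimpleName(ClassLoader start, String simpleName) {
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl.getClass().getSimpleName().equals(simpleName)) {
                return cl;
            }
        }
        return null;
    }
}
```

In a plain unit test JVM, calling `findBySimpleName(Thread.currentThread().getContextClassLoader(), "MutableURLClassLoader")` would be expected to return null, which is why the user jar case needs an integration test.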

Member

@felixcheung felixcheung left a comment

Why is kafka-clients-0.11.0.3.jar in the resources? I think we need to avoid binaries in the repo (especially since this is a big one).

.filter { u => u.getProtocol == "file" && new File(u.getPath).isFile }
// Some bad spark packages depend on the wrong version of scala-reflect. Blacklist it.
.filterNot { u =>
Paths.get(u.toURI).getFileName.toString.contains("org.scala-lang_scala-reflect")
Member

can we make this configurable?

Contributor Author

User jars are configurable via spark.jars or spark.jars.packages. This code is the internal runtime mechanism for detecting which user jars have been specified.

Member

I mean the org.scala-lang_scala-reflect

also indentation seems off
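
For readers following this thread, a rough sketch of what a configurable exclusion could look like, written in Java rather than the Scala of the diff. The class `UserJarFilterSketch`, the method `filterUserJars`, and the `excludedNameParts` parameter are hypothetical; no existing Zeppelin property is implied, and this is not what the PR implements.

```java
import java.io.File;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class UserJarFilterSketch {
    /**
     * Keeps only file: URLs that point at regular files, then drops any
     * whose file name contains one of the excluded substrings.
     * `excludedNameParts` stands in for a user-configurable list
     * (an assumption for this sketch).
     */
    static List<URL> filterUserJars(List<URL> urls, List<String> excludedNameParts) {
        List<URL> kept = new ArrayList<>();
        for (URL u : urls) {
            if (!"file".equals(u.getProtocol())) {
                continue; // same protocol check as in the diff
            }
            File f = new File(u.getPath());
            if (!f.isFile()) {
                continue; // skip directories and missing paths
            }
            boolean excluded = false;
            for (String part : excludedNameParts) {
                if (f.getName().contains(part)) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(u);
            }
        }
        return kept;
    }
}
```

Passing a list containing only "org.scala-lang_scala-reflect" would reproduce the hard-coded behavior quoted above, while an empty list would disable the exclusion.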

Contributor Author

zjffdu commented Feb 17, 2019

Member

felixcheung commented Feb 17, 2019 via email

Contributor Author

zjffdu commented Mar 2, 2019

@felixcheung I have removed the kafka jar and now use the zeppelin-interpreter-integration jar instead.

private void testInterpreterBasics() throws IOException, InterpreterException, XmlPullParserException {
// add jars & packages for testing
InterpreterSetting sparkInterpreterSetting = interpreterSettingManager.getInterpreterSettingByName("spark");
sparkInterpreterSetting.setProperty("spark.jars.packages", "com.maxmind.geoip2:geoip2:2.5.0");
Member

use something in org.apache.zeppelin instead?

Contributor Author

It may already be shipped to the Spark driver and placed on the classpath, so it is better to use some other library.
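
To illustrate the concern, a test library is only a useful probe if its classes cannot be loaded before the jar or package is added. A minimal sketch of such a check follows; `ProbeClassSketch` and `isLoadable` are hypothetical names for this example and are not part of the PR.

```java
final class ProbeClassSketch {
    /**
     * Returns true if `className` can be resolved through `loader`.
     * The class is not initialized, so no static initializers run.
     */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

If `isLoadable` already returns true for a candidate class without any user jar configured, an import of that class succeeding in the test says nothing about whether spark.jars.packages worked.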

Contributor Author

zjffdu commented Mar 5, 2019

Will merge if no more comments

@asfgit asfgit closed this in 78c49e9 Mar 5, 2019
@maziyarpanahi
Contributor

Hi @zjffdu,
Is this also being added to 0.8.2, or just master?

Many thanks.

Contributor Author

zjffdu commented Jun 9, 2019

Both master & 0.8.2

@maziyarpanahi
Contributor

Thanks @zjffdu,
The reason I asked is that it still only works if I list the JARs in spark.jars in the UI, but not with

export SPARK_SUBMIT_OPTIONS="--jars "

I cloned the latest source and checked out 0.8, so it already has this pull request merged.

@maziyarpanahi
Contributor

On top of this, spark.jars.packages also doesn't work. The UI shows the JARs, but an import of their classes can't find them.
